Collect the data

Question 1

Build a graph to help the city determine when most buildings were constructed. Is there anything in the results that causes you to question the accuracy of the data? (note: only look at buildings built since 1850)

First, inspect the size and consistency of the data:

  • Row count: 577126
  • Range of construction year: min 0, max 2040

The maximum year is in the future and the minimum year is zero, so some of the records are clearly incorrect. We’ll keep only the buildings built between 1850 and 2017 and count them in 10-year buckets:
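The filtering and bucketing step can be sketched roughly as follows. This is a minimal standard-library Python illustration, not the code used to produce the chart; the function name `count_by_decade` and the sample years are made up for the example, and each bucket is labeled by its starting year.

```python
from collections import Counter

def count_by_decade(years, first=1850, last=2017):
    """Count construction years in 10-year buckets, skipping implausible values."""
    counts = Counter()
    for year in years:
        # Drop zero years, future years, and anything before the cutoff.
        if first <= year <= last:
            counts[year // 10 * 10] += 1  # 1905 falls into the 1900 bucket
    return dict(sorted(counts.items()))

# Made-up sample values, not taken from the dataset.
sample_years = [0, 1849, 1850, 1899, 1905, 1910, 1910, 2017, 2040]
decade_counts = count_by_decade(sample_years)
print(decade_counts)
```

The years 0, 1849 and 2040 fall outside the range and are left out of the counts.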

Most buildings in the dataset were built after 1900, and there seems to be an unrealistic jump from 1.2k in 1890-1900 to 116.5k in 1900-1910. This is explained by the data dictionary accompanying the dataset:

Year Built is accurate for the decade but not necessarily for the specific year. Two outliers – 1910 & 1920. Structures built between 1800s and early 1900s usually have a Year Built date of either 1910 or 1920.

Question 2

Create a graph that shows how many buildings of a certain number of floors were built in each year. It should be clear when 20-story buildings, 30-story buildings, and 40-story buildings were first built in large numbers.

Inspect the NumFloors variable first:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   2.000   2.000   2.505   3.000 119.000   18788 

We see about 18.8k records with no information on the number of floors, as well as some records with zero floors. The data dictionary explains the zeros:

If the NUMBER OF FLOORS is zero and the NUMBER OF BUILDINGS is greater than zero, then NUMBER OF FLOORS is not available for the tax lot. If the NUMBER OF FLOORS is zero and the NUMBER OF BUILDINGS is zero, then NUMBER OF FLOORS is not applicable for the tax lot.

So we’ll only consider the records with a positive NumBldgs and a positive NumFloors:
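A rough sketch of this filter, followed by a count per year and floor band, might look like the following. It is standard-library Python for illustration only; the `YearBuilt` key, the band labels, and the sample lots are assumptions made for the example, while `NumBldgs` and `NumFloors` are the names used in the text above.

```python
from collections import Counter

def floor_band(num_floors):
    """Label a building by the highest threshold it reaches (20, 30 or 40 floors)."""
    if num_floors >= 40:
        return "40+"
    if num_floors >= 30:
        return "30-39"
    if num_floors >= 20:
        return "20-29"
    return "under 20"

def count_by_year_and_band(lots):
    """Count lots per (year, floor band), keeping only positive NumBldgs and NumFloors."""
    counts = Counter()
    for lot in lots:
        floors = lot.get("NumFloors")
        # Missing or zero floors means "not available" or "not applicable".
        if floors is None or floors <= 0 or lot["NumBldgs"] <= 0:
            continue
        counts[(lot["YearBuilt"], floor_band(floors))] += 1
    return dict(counts)

# Made-up sample lots, not taken from the dataset.
sample_lots = [
    {"YearBuilt": 1925, "NumBldgs": 1, "NumFloors": 22},
    {"YearBuilt": 1925, "NumBldgs": 1, "NumFloors": 0},     # floors not available
    {"YearBuilt": 1925, "NumBldgs": 0, "NumFloors": 0},     # floors not applicable
    {"YearBuilt": 1925, "NumBldgs": 1, "NumFloors": None},  # NA
    {"YearBuilt": 1925, "NumBldgs": 1, "NumFloors": 25},
    {"YearBuilt": 1965, "NumBldgs": 2, "NumFloors": 45},
]
band_counts = count_by_year_and_band(sample_lots)
print(band_counts)
```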

As we can see from the chart, the first boom in 20- and 30-storey buildings began in 1920-1930, and buildings of 40 storeys and higher came in larger numbers after 1960.

Question 3

Your boss suspects that buildings constructed during the US’s involvement in World War II (1941-1945) are more poorly constructed than those before and after. She thinks that, if you calculate assessed value per floor, you will see lower values for buildings at that time vs before or after. Construct a chart/graph to see if she’s right.

First we calculate the assessed value of a building by subtracting the assessed value of the land from the total assessed value:
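As a minimal sketch of the arithmetic (illustrative Python, not the original analysis code), with the parameter names `assess_total` and `assess_land` assumed for the example and a guard for lots without a usable floor count:

```python
def building_value_per_floor(assess_total, assess_land, num_floors):
    """Return (total - land) / floors, or None if the floor count is unusable."""
    if num_floors is None or num_floors <= 0:
        return None
    building_value = assess_total - assess_land  # value of the structure alone
    return building_value / num_floors

# Made-up figures: 500,000 total, 200,000 of which is land, 3 floors.
value = building_value_per_floor(500_000, 200_000, 3)
print(value)
```

Returning `None` for zero or missing floors keeps those lots out of any later average instead of causing a division by zero.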

Since it makes sense to compare wartime buildings only with buildings from roughly the same era, we’ll calculate and plot the mean assessed value per floor for three intervals:

  • Buildings constructed in the ten years preceding the war
  • Buildings constructed during 1941-1945
  • Buildings constructed in the ten years following the war
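The grouping into these three intervals can be sketched as below. This is an illustrative Python snippet under stated assumptions: the exact boundaries 1931-1940 and 1946-1955 are one reading of "ten years preceding" and "ten years following", and the sample records are invented.

```python
from statistics import mean

# Assumed interval boundaries (inclusive); the text only says "ten years".
ERAS = {
    "pre-war (1931-1940)": (1931, 1940),
    "wartime (1941-1945)": (1941, 1945),
    "post-war (1946-1955)": (1946, 1955),
}

def mean_value_per_floor_by_era(records):
    """records: iterable of (year_built, value_per_floor) pairs."""
    grouped = {name: [] for name in ERAS}
    for year, value in records:
        for name, (start, end) in ERAS.items():
            if start <= year <= end:
                grouped[name].append(value)
    # Skip eras with no records so mean() is never called on an empty list.
    return {name: mean(values) for name, values in grouped.items() if values}

# Made-up sample records, not taken from the dataset.
sample_records = [
    (1935, 100.0), (1938, 200.0),
    (1943, 90.0),
    (1950, 150.0), (1950, 250.0),
    (1960, 999.0),  # outside all three intervals, ignored
]
era_means = mean_value_per_floor_by_era(sample_records)
print(era_means)
```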

We can see that the boss was right: the wartime buildings have a clearly lower mean assessed value per floor than the ones constructed immediately before or after.

To put the figures in a broader context, visualize the metric for the complete dataset:

We observe the wartime drop in value discovered above, as well as a dramatic jump in value per floor for the buildings constructed between 2010 and 2015.

There is one outlier in Brooklyn. What is it?

# A tibble: 1 x 3
              Address            OwnerName LandUse
               <fctr>               <fctr>   <int>
1 620 ATLANTIC AVENUE ARENA NOMINEE SUB B,       8

This is Barclays Center, which, according to Wikipedia, is “a multi-purpose indoor arena in the New York City borough of Brooklyn. The arena is part of a $4.9 billion future business and residential complex now known as Pacific Park.”
This explains the high value per floor.

If we restrict the dataset to the land use categories for residential buildings (LandUse categories 1-3 according to the codebook), we get a more balanced view of the value per floor:
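A minimal sketch of this restriction, in illustrative Python with invented sample lots (only the `LandUse` name and the codes 1-3 and 8 come from the text above):

```python
# Residential land use codes, per the codebook as cited in the text.
RESIDENTIAL_LAND_USE = {1, 2, 3}

def residential_only(lots):
    """Keep only lots whose LandUse code is one of the residential categories."""
    return [lot for lot in lots if lot.get("LandUse") in RESIDENTIAL_LAND_USE]

# Made-up sample lots; LandUse 8 matches the arena outlier shown earlier.
sample_lots = [
    {"Address": "EXAMPLE STREET 1", "LandUse": 1},
    {"Address": "EXAMPLE STREET 2", "LandUse": 8},
    {"Address": "EXAMPLE STREET 3", "LandUse": 3},
    {"Address": "EXAMPLE STREET 4", "LandUse": None},  # missing code is dropped
]
residential = residential_only(sample_lots)
print([lot["Address"] for lot in residential])
```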

Bonus

An interactive map of the growth of New York: the tax lots from the dataset above are visualized in the order of building construction (opens in a new page):